ABSTRACT


Massive Open Online Courses (MOOCs) are a promising form of
online education. However, academic dishonesty threatens the
value of MOOC certificates as a serious tool for recruiters and
employers. Recently, a large-scale study of the log traces from
more than one hundred MOOCs created by Harvard and MIT
identified a specific cheating strategy viable in MOOCs: Copying
Answers using Multiple Existences Online (CAMEO). In essence,
learners create several accounts on a MOOC platform, request
assessment solutions via some of the accounts, and then submit
these "harvested" solutions in their main account to receive
credit. In our work, we replicate the CAMEO detection approach
and apply it to ten edX MOOCs created by the Delft University of
Technology. Our results show that in those MOOCs, 1.9% of
certificates were likely earned through CAMEO cheating, a number
comparable to the fraction of cheating observed in Harvard and
MIT MOOCs.
1. INTRODUCTION


Cheating is generally defined as using dishonest means to gain
an undeserved reward or to escape an embarrassing situation [3].
Academic dishonesty is a type of cheating that occurs in relation
to an academic exercise. It is widespread across different levels
and forms of education [4]. Students adopt diverse cheating
strategies, such as impersonation, bringing notes into the exam
hall, using an unauthorized digital device, and so on.


MOOCs, which are courses designed with open access for a large
number of online participants, have become a vital part of
scalable education. However, the effectiveness of MOOCs has been
threatened by academic dishonesty. For instance, as early as
2012, some instructors voiced concerns about various forms of
cheating in their MOOCs [7].


One of the main obstacles to studying cheating in MOOCs is the
general lack of ground truth data: MOOC providers may be
reluctant to confront learners (definite proof of cheating is
difficult and time-consuming to obtain), and MOOC learners are
reluctant to admit their misbehaviour. Recently, Northcutt et
al. [5] proposed a first approach to automatically detect a
particular kind of cheating purely based on the log data that is
collected in major MOOC platforms; they termed this cheating
strategy CAMEO, or Copying Answers using Multiple Existences
Online. In brief, their method detects learners who cheat in the
following way: (1) A learner registers multiple accounts on a
MOOC platform and enrolls in a MOOC of interest with all these
accounts; one of those registered accounts is the learner's main
account. (2) The learner uses some of the registered accounts to
randomly submit answers to assessment questions (which in MOOCs
are often multiple-choice or fill-in-the-blank questions to
enable automatic grading) as a way to harvest the correct
solutions. This is made possible by a design decision of major
MOOC platforms, which allow learners to check the correct
solutions immediately after submission. (3) The learner then
submits the harvested solutions through the main account,
allowing the learner to successfully complete the course and
earn a certificate. Commonly, achieving 60% (or a similar
percentage) of all possible points is sufficient to receive a
MOOC certificate.


Among the many potential ways of cheating in MOOCs, CAMEO is of
particular concern for a number of reasons: (1) the CAMEO
cheating strategy can be performed by every learner
individually, as it does not require learners to collaborate
with others; (2) CAMEO cheating is efficient and easy to execute
as it directly utilizes the solutions provided in a MOOC; and
(3) CAMEO cheating can be applied across many different MOOCs,
largely independent of the subject or course level.


Northcutt et al. [5] observed CAMEO cheating in 69 edX MOOCs
(out of 115 investigated) provided by MIT and Harvard
University; among those 69, approximately 1.3% of the
certificates were issued to learners identified as CAMEO users.
Given that MOOCs provided by different universities usually
attract varying sets of learners, in this work, we investigate
the following two Research Questions:


RQ1 What is the prevalence of CAMEO cheating in the 
MOOCs provided by TU Delft? 


RQ2 What are the characteristics of learners identified to have
employed the CAMEO strategy?


To answer these questions, we implement the detection approach
as described in [5] and apply it to the log traces of 10 edX
MOOCs. We find that 1.9% of the certificates are earned by CAMEO
learners (our answer to RQ1), with some types of MOOCs more
prone to cheating than others. While we did not observe any
CAMEO behaviour in a MOOC on political debates, we found more
than 6% of certificates to be CAMEO certificates in both a
business course and a technical course. With respect to RQ2, we
observe cheating to be most prevalent mid-course and to be more
prevalent in some user demographics than others.


2. RELATED WORK 


A few works have investigated the prevalence of cheating in
MOOCs. Two of the earliest are [5] and [6]. Both focus on the
detection of CAMEO cheating based on learners' traces in MOOCs
offered on edX by MIT (and, in the case of [5], also Harvard).


In [5], 1.3% of the certificates among 69 MOOCs covering
different subjects were earned by learners who adopted CAMEO
cheating strategies. Learners who applied CAMEO were more likely
to be young, male and international than the other certified
learners. In [6], the corresponding number is 10.3% of the
certificates in an introductory physics MOOC.


In both of these works, the researchers define behavioural
patterns that characterize CAMEO and select the learners whose
behaviour satisfies these patterns. The criteria adopted by the
two works overlap. Ruiperez-Valiente et al. [6] make more
detailed assumptions about the different modes of CAMEO.
Northcutt et al. [5] analyze more than 100 MOOCs, which helps to
avoid accidental bias in the prevalence of CAMEO caused by
individual courses.


Compared to these works, our goal is to investigate the
prevalence of this cheating behaviour in the MOOCs provided by
TU Delft and the common characteristics of the detected
cheaters.


3. DETECTION METHOD 


In this section we recap the main assumptions that underpin the
approach of Northcutt et al. [5]. Note that these assumptions
are derived from intuitions about MOOC learners' (or more
generally online users') behaviours on the learning platform.
Our implementation of the approach matches the original paper's
algorithmic formulation as closely as possible.

• CAMEO users hold at least two accounts. Each CAMEO user
  (i.e., a learner who cheats to gain an advantage in a MOOC)
  should use one or more accounts to harvest solutions (the
  so-called Harvester Account(s)) and one main account to submit
  the correct solutions (the Master Account) so as to earn the
  certificate. Initially, every possible pair of user accounts
  enrolled in a particular MOOC is a candidate Master/Harvester
  pair.


• CAMEO users harvest solutions before entering them into their
  Master Account. In other words, for questions that learners
  cheat on, the candidate Harvester Account must obtain the
  solution before the candidate Master Account submits it.


• CAMEO users quickly pass collected solutions from Harvester
  Accounts to the Master Account. It is reasonable to assume
  that a cheater may be logged into both the Harvester Account
  and the Master Account simultaneously and, once the correct
  solutions are collected, may immediately submit them through
  the Master Account. This assumption requires the time
  difference between the solution request from the candidate
  Harvester Account and the correct submission from the
  candidate Master Account to be small.


• Master Accounts are certified, Harvester Accounts are not.
  Given that Harvester Accounts are mainly used to gather
  correct solutions by randomly submitting answers, more often
  than not the Harvester Accounts do not reach the passing
  threshold of a MOOC. At the same time, the Master Accounts
  should perform well in that respect and earn a certificate.


• The Master Account and Harvester Account are connected via IP
  addresses. As noted before, a CAMEO user may be logged into
  multiple accounts simultaneously on the same device or on
  different devices in the same location; thus, it is likely
  that the Master and Harvester Account share a common logged IP
  address during the MOOC.


In the CAMEO approach, these intuitions are transformed into
filtering rules that are applied to the initially created
account pairs; only candidate Master/Harvester pairs that meet
all of these criteria are considered to be CAMEO users, that is,
learners who cheat through multiple account usage in a MOOC.
Most of these rules contain ad-hoc parameters (e.g., the time
limit between a Harvester and Master Account submission); we
have followed the parameter settings described in [5] in our
implementation.
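

The filtering rules above can be illustrated with a minimal
Python sketch. It is not the implementation of [5] or of this
study: the `Account` record, the function names, the 60-second
`MAX_DELAY_SECONDS` value and the "at least one harvested
question" criterion are all assumptions made for illustration,
since the actual parameter settings are those described in [5].

```python
from dataclasses import dataclass
from itertools import permutations

# Hypothetical threshold; the real setting follows [5] and is not given here.
MAX_DELAY_SECONDS = 60.0


@dataclass
class Account:
    """Hypothetical per-course summary of one account's log trace."""
    account_id: str
    certified: bool
    ips: set                 # IP addresses logged for this account
    solution_shown: dict     # question id -> time (s) the solution was revealed
    correct_submitted: dict  # question id -> time (s) of first correct answer


def harvested_questions(harvester, master, max_delay=MAX_DELAY_SECONDS):
    """Questions where the harvester saw the solution shortly BEFORE
    the master submitted a correct answer."""
    hits = []
    for question, t_master in master.correct_submitted.items():
        t_harvest = harvester.solution_shown.get(question)
        if t_harvest is not None and 0 <= t_master - t_harvest <= max_delay:
            hits.append(question)
    return sorted(hits)


def candidate_pairs(accounts, max_delay=MAX_DELAY_SECONDS):
    """Yield (master_id, harvester_id, questions) for every ordered pair
    that survives all filters."""
    for master, harvester in permutations(accounts, 2):
        # Master Accounts are certified, Harvester Accounts are not.
        if not master.certified or harvester.certified:
            continue
        # Both accounts must share at least one logged IP address.
        if not (master.ips & harvester.ips):
            continue
        # Ordering and time-gap rules, checked per question.
        questions = harvested_questions(harvester, master, max_delay)
        if questions:  # assumption: one harvested question suffices
            yield master.account_id, harvester.account_id, questions
```

Starting from all ordered pairs mirrors the statement that every
pair of enrolled accounts is initially a candidate; on real
course logs a pre-grouping by IP address would likely be needed
to keep the number of pairs manageable.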


4. EXPERIMENT 


4.1 Dataset 

Our study is based on the log data generated during 10 edX MOOCs
(eight different MOOCs of which two ran twice) which were
provided by TU Delft between 2014 and 2016. The MOOCs cover
various scientific areas including data science, programming
paradigms, biotechnology, business and political science. An
overview of the MOOCs, including the number of enrolled learners
and the number of certificates earned, is shown in Table 1.

Table 1: Overview of the ten MOOCs included in this study.
#Enrollments shows the number of user accounts that registered
for each MOOC and #Certificates lists the number of registered
participants that achieved a certificate (the passing threshold
is 50% for Frame101x and 60% for all other MOOCs). Note that
FP101x and EX101x are listed twice, as they both ran in two
different time periods.

Course Code   Course Title                      Session       #Enrollments  #Certificates
FP101x        Functional Programming            2014 Fall     37,940        1,356
CTB3365DWx    Drinking Water Treatment          2014 Fall     10,458          246
EX101x        Data Analysis                     2015 Spring   33,515        2,190
Frame101x     Framing: How Politicians Debate   2015 Spring   34,017          919
Calc001x      Pre-university Calculus           2015 Summer   27,857          358
EX101x        Data Analysis                     2015 Fall     21,041        1,156
IB01x         Industrial Biotechnology          2015 Fall      8,143          329
FP101x        Functional Programming            2015 Fall     20,936        1,143
RI101x        Responsible Innovation            2016 Spring    2,741          113
CTB3365STx    Urban Sewage Treatment            2016 Spring    9,566          361


Table 2: Overview of the detected CAMEO users and the percentage
of certificates gained by CAMEO users. The last row shows the
numbers across all ten MOOCs.

Course Code       #CAMEO Users   % CAMEO Certificates
FP101x (2014)      13            0.96%
CTB3365DWx          4            1.63%
EX101x (2015S)     27            1.23%
Frame101x           0            0.00%
Calc001x           13            3.63%
EX101x (2015F)     20            1.73%
IB01x              12            3.65%
FP101x (2015)      16            1.40%
RI101x              7            6.19%
CTB3365STx         25            6.93%

Total             137            1.89%


4.2 CAMEO Detection Results

For each of the MOOCs, we present the number of detected CAMEO
users (and subsequently the percentage of certificates gained
through CAMEO) in Table 2. CAMEO users are detected in 9 out of
the 10 MOOCs and overall account for 137 (or 1.89%) of all
certificates. This percentage is slightly higher than the 1.3%
reported by Northcutt et al. [5]. The percentages vary across
courses, with Urban Sewage Treatment being the MOOC with the
largest percentage of CAMEO certificates, nearly 7%. On the
other hand, the only MOOC in which we did not detect CAMEO
cheating is Framing: How Politicians Debate. In future work we
will investigate this variance in CAMEO between courses; we
hypothesize that for participants in Frame101x a certificate has
less intrinsic value (the self-development aspect is more
important) and thus cheating is less likely to occur.


4.3 Verification of CAMEO Users


To explore how plausible the detection results are (i.e.,
whether the detected account pairs actually belong to the same
learner and whether the learner indeed cheated), we manually
verified key account characteristics. It is sensible, for
instance, to assume that at least some CAMEO users register with
the same or a similar name across the Harvester and Master
Account. Indeed, among our 137 detected CAMEO users, 20% have
similar or even identical registered full names attached to
their Harvester and Master Accounts¹. To provide the reader with
some intuition, we now describe, for a randomly picked CAMEO
user in our dataset, the similarities between the detected
Master and Harvester Account:


• The Harvester & Master Account have the same registered full
  name.


• The registered email addresses of the Harvester & Master
  Account contain a common long character sequence (eight
  characters).


• The Harvester & Master Account use the same IP address to
  answer every question.


• The Harvester & Master Account submit answers within 60
  seconds of each other for every harvested question, and the
  Harvester Account always submits before the Master Account.


• The Harvester Account submits answers for all questions in the
  course, but only 11.5% of them are correct.


Based on these observations, we are highly confident that 
the learner is indeed a CAMEO user. 
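

The name and email checks described in this section can be
sketched with Python's standard `difflib` module, whose
`SequenceMatcher` is closely related to (but, per its
documentation, not identical with) the Ratcliff/Obershelp method
cited for name similarity. The helper names and the
normalization (lower-casing, collapsing whitespace) are
assumptions for illustration; the cut-off used to call two names
"similar" is not stated in this paper and is therefore left out.

```python
from difflib import SequenceMatcher


def name_similarity(name_a: str, name_b: str) -> float:
    """Similarity in [0, 1] between two registered full names.

    Normalization is an assumption of this sketch.
    """
    a = " ".join(name_a.lower().split())
    b = " ".join(name_b.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def longest_common_run(text_a: str, text_b: str) -> str:
    """Longest contiguous character sequence shared by two strings,
    e.g. two registered email addresses."""
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    match = matcher.find_longest_match(0, len(text_a), 0, len(text_b))
    return text_a[match.a:match.a + match.size]


# Made-up example values, not taken from the dataset.
print(name_similarity("Jane Doe", "jane  doe"))
print(longest_common_run("jdoe1987@a.org", "jdoe1987@b.com"))
```

A pair would then be flagged for manual inspection when the name
similarity is high or the shared email run is long, with both
thresholds chosen by the analyst.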


¹ We compute the similarity between two account names according
to the Ratcliff/Obershelp sequence match method [1].


4.4 Characteristics of CAMEO Users

To gain a better understanding of the detected CAMEO users, we
analyze their characteristics and patterns. With respect to
nationality, we find that the certified learners come mainly
from the US, the Netherlands and the UK. However, the detected
CAMEO users are mainly from India (27), the US (12) and Germany
(7).


We are also interested in the motivation of CAMEO cheaters,
i.e., what drives them to cheat in MOOCs. Intuitively, we
believe most CAMEO users to be strongly goal-oriented, with the
goal being the certificate (instead of a goal related to
knowledge gains). To verify this intuition, we compute how many
detected CAMEO users would be able to earn a certificate without
CAMEO cheating. Specifically, we calculate the grades of CAMEO
users on the condition that they only receive credit for
questions they did not cheat on and evaluate whether the scores
are sufficient to pass the course. As shown in Table 3, about
91% of the CAMEO users (125 of 137) cannot pass the MOOCs
without cheating, which suggests that most of the CAMEO users
are purely certificate-driven.


Table 3: Overview of the identified CAMEO learners and their
certificate status (pass or fail) if the assessment points they
gained through CAMEO were removed.

Course Code       Pass w/o CAMEO pnts   Fail w/o CAMEO pnts
FP101x (2014)      2                     11
CTB3365DWx         0                      4
EX101x (2015S)     3                     24
Frame101x          0                      0
Calc001x           0                     13
EX101x (2015F)     4                     16
IB01x              0                     12
FP101x (2015)      2                     14
RI101x             1                      6
CTB3365STx         0                     25
Total             12                    125
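

The counterfactual check behind Table 3 can be sketched as
follows. The function name, the data layout (points per
question, a set of harvested question ids) and the example
learner are assumptions for illustration; only the 60% threshold
and the totals 125 and 137 come from the text.

```python
def passes_without_cameo(points_by_question, harvested, max_points,
                         threshold=0.6):
    """Return True if the learner still reaches the passing threshold
    when points earned on harvested questions are discarded.

    points_by_question: question id -> points the learner earned
    harvested:          ids of questions flagged as answered via CAMEO
    max_points:         total points obtainable in the course
    """
    honest_points = sum(points
                        for question, points in points_by_question.items()
                        if question not in harvested)
    return honest_points / max_points >= threshold


# Made-up learner: 7 of 10 one-point questions correct, 4 of them harvested.
earned = {f"q{i}": 1 for i in range(1, 8)}
print(passes_without_cameo(earned, {"q1", "q2", "q3", "q4"}, max_points=10))

# Share of detected CAMEO users who fail without CAMEO points (Table 3).
fail_share = 125 / 137
print(f"{fail_share:.1%}")
```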


We also investigate when CAMEO users are most likely to cheat
during the course of a MOOC. To this end, we select FP101x (2014
and 2015) and EX101x (2015 Spring and 2015 Fall) for analysis,
as the grading strategies adopted across the four MOOCs are very
similar: almost all questions (more than 100 per course) are
worth a single point and the final grade is simply based on the
fraction of questions the learner answered correctly (with 60%
of correct answers being the passing threshold). Figures 1
(FP101x) and 2 (EX101x) show the average number of identified
CAMEO users per question that resort to the CAMEO strategy in
the different course weeks.


Figure 1: Average number of CAMEO cheaters per question in each
course week of FP101x.


Figure 2: Average number of CAMEO cheaters per question in each
course week of EX101x.


Few learners resort to CAMEO in the first two weeks of the
course, while course weeks 3, 4, 5 and 6 attract the most
cheating. This is not overly surprising considering that the
questions in later weeks are usually more difficult than those
in early weeks. The trend of decreased CAMEO in the final
week(s) can be explained by the fact that the edX platform
provides a Progress page where learners can check their progress
towards the passing threshold. For a learner whose main goal is
the certificate, the realization of that goal (which can occur
as early as week 5, as the passing threshold is 60%) is likely
to reduce or stop their CAMEO behaviour.
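

One plausible way to compute the quantity plotted in Figures 1
and 2 is sketched below. The exact aggregation used for the
figures is not spelled out in the text, so the definition here
(distinct cheaters per question, averaged over all questions of
a week) and the event layout are assumptions.

```python
from collections import defaultdict


def avg_cheaters_per_question(events, questions_per_week):
    """Average number of distinct CAMEO users per question, per week.

    events:             iterable of (user_id, question_id, week) tuples,
                        one per detected harvested answer
    questions_per_week: week -> number of graded questions in that week
    """
    cheaters = defaultdict(set)  # (week, question) -> distinct users
    for user, question, week in events:
        cheaters[(week, question)].add(user)

    totals = defaultdict(int)    # week -> summed cheater counts
    for (week, _question), users in cheaters.items():
        totals[week] += len(users)

    # Questions nobody cheated on still count in the denominator.
    return {week: totals[week] / count
            for week, count in questions_per_week.items() if count > 0}


# Made-up events for a three-week toy course.
toy_events = [("u1", "q1", 1), ("u2", "q1", 1), ("u1", "q2", 1),
              ("u1", "q3", 2), ("u1", "q3", 2)]
print(avg_cheaters_per_question(toy_events, {1: 2, 2: 2, 3: 4}))
```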


5. CONCLUSION 

We successfully replicated the CAMEO detection approach
formalized in [5] and applied it to a novel set of MOOCs.
Overall, we found similar percentages of CAMEO cheating in TU
Delft MOOCs (1.9% vs. 1.3%), albeit with the limitation that we
only explored 10 MOOCs (vs. 115 by MIT/Harvard). We are
currently enlarging the study to include all 50 MOOCs that are
provided by TU Delft. Our future work will place a greater
emphasis on the demographic analysis of CAMEO users and on ways
to reduce and prevent such cheating, either through
technological means or through ethical appeals and moral
reminders [2].